Some basics in Math and Probability
Some basic Math
Probability
A collection of things
Other examples
\(\{1,2,3\}\)
\(\{blue, grey, red\}\)
Elements
\(x \in A\) means that \(x\) belongs to \(A\) (“is an element of \(A\)”)
Subsets
\(A \subseteq B\) means that all elements in \(A\) are also elements of \(B\)
Intervals:
\((a, b]\) is short for \(\{x \in \mathbb{R} \mid a < x \le b \}\) (This reads: the set of all \(x \in \mathbb{R}\) that satisfy the condition to the right of \(\mid\))
Analogously, we define \([a, b]\), \([a, b)\), \((a, b)\).
\(A=\{1,4,5\}\), \(B=\{4,5,6\}\)
Union
\(A\cup B= \{1,4,5,6\}\)
Intersection
\(A\cap B= \{4,5\}\)
\(A=\{1,4,5\}\), \(B=\{4,5,6\}\)
Complement
\(A^{\mathrm{C}}=\{ 2,3 \} \cup \{6,7,... \}\)
All elements (of \(U\)) that do not belong to \(A\). Here \(U\) is some universal set e.g. the natural numbers in our example.
Set difference
\(A \backslash B= \{1\}\)
All elements of \(A\) that do not belong to \(B\)
On average, 40 out of 100 patients die if untreated and 20 out of 100 die if they receive treatment A.
How many patients do we need to treat with A to save one life (on average)?
BTW: The number in question is called the Number Needed to Treat (NNT).
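The arithmetic behind the NNT, sketched in code using the risks given above:

```python
# Number Needed to Treat (NNT) = 1 / absolute risk reduction
risk_untreated = 40 / 100   # 40 out of 100 die if untreated
risk_treated = 20 / 100     # 20 out of 100 die with treatment A

arr = risk_untreated - risk_treated  # absolute risk reduction: 0.20
nnt = 1 / arr
print(nnt)  # 5.0 -> on average, treating 5 patients saves one life
```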
The short way of writing sums:
Given a sequence of values, say \(x_{1}=1,x_{2}=2, x_{3}=3\), do something with each of them (e.g. square them), and add together the results.
\[\sum_{i=1}^{3}x_{i}^2=1^2+2^2+3^2=1+4+9=14\]
We’ll get back to the formula in the title later
Given a sequence of values: \(x_{1}=1, x_{2}=4,x_{3}=6, x_{4}=3, x_{5}=4\)
What is the following sum? \[\sum_{i=3}^{5}x_{i}=?\]
What is the following sum? \[\sum_{i=1}^{3}(x_{i}^2-x_{i})=?\]
Similarly, a short way of writing products:
\[\prod_{i=1}^{3}x_{i}^2=1^2\times2^2\times3^2=1\times4\times9=36\]
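Both notations translate directly into code; here is a sketch using the sequence \(x_1=1, x_2=2, x_3=3\) from the examples above:

```python
import math

x = [1, 2, 3]  # x_1 = 1, x_2 = 2, x_3 = 3

# Sum notation: square each x_i, then add the results
s = sum(xi**2 for xi in x)
print(s)  # 14

# Product notation: square each x_i, then multiply the results
p = math.prod(xi**2 for xi in x)
print(p)  # 36
```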
Exponentiation means multiplying a number (called the base) repeatedly with itself: \[z^n=\underbrace{z \times z \times \ldots \times z}_{n\ \text{times}} \] As we all know \(10^3=1000\)
We can also exponentiate with the inverse of \(n\), which means taking the \(n^{\text{th}}\) root: \[z^{\frac{1}{n}}=\sqrt[n]{z}\] For example: \(1000^{\frac{1}{3}}=10\)
If \(z\) is a positive number we can generalise exponentiation, allowing the exponent to be any number between \(-\infty\) and \(\infty\). For any two such exponents \(a\) and \(b\), we have:
\[z^a \cdot z^b=z^{a+b}\] and \[\frac{z^a}{z^b}=z^{a-b}\] This is easy to see in the following examples: \(10^3 \cdot 10^2=10^5\) and \(\frac{10^3}{10^2}=10^{3-2}=10^1=10\)
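The two rules can be verified numerically, e.g. with the base 10 example from the text:

```python
z, a, b = 10.0, 3, 2

# Product rule: z^a * z^b = z^(a+b)
assert z**a * z**b == z**(a + b)   # 10^3 * 10^2 == 10^5

# Quotient rule: z^a / z^b = z^(a-b)
assert z**a / z**b == z**(a - b)   # 10^3 / 10^2 == 10^1
print(z**a * z**b, z**a / z**b)    # 100000.0 10.0
```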
Based on these rules, what is
\[z^0 =?\]
Bacteria of a given strain divide every 20 minutes.
Assume that there are 10 bacteria in a culture medium.
Say you have CHF 1000 on your bank account and the bank pays an interest rate of 5% annually.
Suppose another bank pays a daily interest rate of \(\frac{1}{365} 5\%\):
As \(n \rightarrow \infty\) the term \((1+\frac{1}{n}\, 0.05)^{n}\) approaches \(e^{0.05}\), where \(e \approx\) 2.7182818 is Euler’s number.
This means that with a bank paying every second your balance would grow as \(1000 \cdot e^{0.05\cdot t}\).
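A quick sketch of how the compounding formula approaches continuous growth, using the CHF 1000 balance and 5% rate from the text:

```python
import math

balance = 1000.0   # initial balance in CHF
rate = 0.05        # 5% annual interest

# Compounding n times per year for one year: balance * (1 + rate/n)^n
for n in (1, 12, 365):
    print(n, balance * (1 + rate / n) ** n)

# Continuous compounding after t years: balance * e^(rate * t)
t = 1
print(balance * math.exp(rate * t))  # about 1051.27
```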
A function converts an input to an output
A bit more formally, a function assigns to every element of set A (domain) exactly one element of set B (codomain). A and B can be the same.
Example: the absolute value of a number \(x\): \[f(x)=|x| = \begin{cases} x, & \text{if}\ x \geq 0 \\ -x, & \text{if}\ x < 0 \end{cases}\]
Example: BMI is a function of height, \(h\) (in m), and weight, \(w\) (in kg): \[f(h,w)=w/h^{2}\]
nagualdesign, CC BY-SA 4.0, via Wikimedia Commons
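The two example functions, sketched in code (the sample inputs are arbitrary choices):

```python
def absolute_value(x: float) -> float:
    """|x|: the piecewise-defined absolute value."""
    return x if x >= 0 else -x

def bmi(h: float, w: float) -> float:
    """BMI as a function of height h (in m) and weight w (in kg)."""
    return w / h**2

print(absolute_value(-3))  # 3
print(bmi(1.80, 75))       # about 23.15
```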
General formula: \[a_{n}x^{n}+a_{n-1}x^{n-1}+\dotsb +a_{2}x^{2}+a_{1}x+a_{0}\]
Example \[f(x)=2x^2+5x+ 1\]
Here some more examples:
The derivative of a function \(f(x)\), denoted \(f'(x)\) or \(\frac{df}{dx}(x)\), is the slope (tangent) of \(f(x)\) at a given point \(x\).
Derivatives of polynomials are particularly easy. Every term in the sum is multiplied with the exponent and the exponent is reduced by 1. The last term can simply be dropped as it is a constant and has zero slope:
\[f(x)=2x^2+5x+ 1\]
\[f'(x)=\frac{df}{dx}=2\cdot2 x^1 + 1\cdot 5 x^0 =4x+5\]
At a minimum or maximum of a function the slope has to be 0:
\[f'(x)=4x+5 \stackrel{!}{=}0\] solving for \(x\) we get \(x_{min}=-1.25\).
If the second derivative \(f''(x)\) is positive (negative), as is the case here, we have a minimum (maximum).
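The derivative and the location of the minimum can be checked numerically with a finite-difference slope (the step size \(h\) is an arbitrary small choice):

```python
def f(x):
    return 2 * x**2 + 5 * x + 1

def slope(f, x, h=1e-6):
    # central finite-difference approximation of f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

# The analytic derivative is f'(x) = 4x + 5
print(slope(f, 2.0))    # about 13.0 (= 4*2 + 5)

# At the minimum x = -1.25 the slope is (approximately) zero
print(slope(f, -1.25))  # about 0.0
```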
The integral of \(f(x)\) with respect to \(x\) on an interval \([a,b]\) is the “area under the curve” between \(a\) and \(b\). We write:
\[\int_a^b f(x)dx\]
KSmrq, via Wikimedia Commons
\[f(x)=\text{exp}(x)=e^x\] The exponential function is the only function for which \(f'(x)=f(x)\), i.e. the slope equals its value at every point \(x\).
Remember the bank paying an annual interest rate of \(5\%\). Your initial balance was CHF 1,000.
How long will it take for your money to double?
Between 14 and 15 years (if the money increased continuously throughout the year).
The logarithm of \(x\) (must be > 0) to the base \(b\), denoted \(\log_b(x)\), is the number with which \(b\) must be exponentiated to obtain \(x\), i.e.
\[b^y=x \iff y=\log_b(x)\]
In our example, we were looking for \(\log_{1.05}(2)=14.2067\).
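The doubling time can be computed with the change-of-base identity \(\log_{1.05}(2)=\ln(2)/\ln(1.05)\):

```python
import math

# Doubling time at 5% annual interest: solve 1.05^y = 2 for y
years = math.log(2) / math.log(1.05)  # log base 1.05 of 2
print(years)  # about 14.2067
```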
Logarithms inherit their rules from exponentiation
\[\begin{align} b^{y_1}\cdot b^{y_2} =b^{y_1 + y_2} &\implies \log_b(x_1 \cdot x_2)=\log_b(x_1)+\log_b(x_2) \\ \frac{b^{y_1}}{b^{y_2}} =b^{y_1 - y_2} &\implies \log_b\left(\frac{x_1}{x_2}\right) = \log_b(x_1)-\log_b(x_2) \\ (b^{y})^p =b^{y\cdot p} &\implies\log_b\left(x^p \right)=p\log_b(x) \end{align}\]
\[\begin{align} \log_{10}(10 \cdot 100) &=\log_{10}(10)+\log_{10}(100)=1+2=3 \\ \log_{10}\left(\frac{100}{10}\right) &= \log_{10}(100)-\log_{10}(10) = 3-1=1 \\ \log_{10}\left(10^2 \right) &= 2\log_{10}(10) = 2 \cdot 1=2 \end{align}\]
Calculate:
\[\log_2(8)=?\] \[\log_5(25^3)=?\]
The natural logarithm uses the base \(e\) and is denoted \(\ln(x)\):
\[x=\exp(y)=e^y \iff y=\ln(x)\]
The function \(f(x)=\ln(x)\) is therefore the inverse of the exponential function:
Mathematically, we conceptualize probabilities using functions from event spaces to the interval \([0,1]\)
Example - Tossing a coin:
Zyggystar, via Wikimedia Commons
Sample space \(\Omega=\{1,2,3,4,5,6\}\)
Every subset of \(\Omega\) is an event:
For a fair die (all sides equally probable), the probability of an event is simply the number of elements it comprises divided by the number of elements in \(\Omega\):
Sascha Lill 95, Wikimedia Commons
From these it follows (exercise) that:
Here \(A^{\mathrm{C}}\) is the event that \(A\) does not occur, also denoted \(\overline{A}\) or \(\neg{A}\).
We refer to \(P(A|B)=\frac{P(A \text{ and } B)}{P(B)}\) as the conditional probability of \(A\) given \(B\).
Mr and Mrs Smith have two children, one of whom is a boy.
What is the probability that the other one is a girl?
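One way to convince yourself is to enumerate the equally likely two-child families, assuming boys and girls are equally probable and births are independent:

```python
from itertools import product

# All equally likely two-child families: BB, BG, GB, GG
families = list(product("BG", repeat=2))

# Condition on "at least one boy"
at_least_one_boy = [f for f in families if "B" in f]

# Among those, count the families that also contain a girl
with_girl = [f for f in at_least_one_boy if "G" in f]

print(len(with_girl), "/", len(at_least_one_boy))  # 2 / 3
```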
Two events \(A\) and \(B\) are independent if
\[P(A \text{ and } B)=P(A)\cdot P(B)\]
It follows that \(P(A|B)=P(A)\) and \(P(B|A)=P(B)\).
Example rolling two dice:
\[ P(A\mid B)=\frac {P(B\mid A)P(A)}{P(B)}=\frac {P(B\mid A)P(A)}{P(B\mid A)P(A)+P(B\mid \neg A)P(\neg A)}\]
Example breast cancer screening:
Probability of cancer given positive test:
\[P(A\mid B)=\frac {0.87 \cdot 0.003}{0.87 \cdot 0.003+0.03 \cdot 0.997}=0.08=8\%\]
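The same calculation as a sketch in code, using the numbers from the screening example:

```python
# Values from the breast cancer screening example
p_cancer = 0.003            # prevalence P(A)
p_pos_given_cancer = 0.87   # sensitivity P(B|A)
p_pos_given_healthy = 0.03  # false-positive rate P(B|not A)

# Bayes' theorem: P(A|B) = P(B|A)P(A) / [P(B|A)P(A) + P(B|not A)P(not A)]
numerator = p_pos_given_cancer * p_cancer
denominator = numerator + p_pos_given_healthy * (1 - p_cancer)
p_cancer_given_pos = numerator / denominator
print(round(p_cancer_given_pos, 3))  # 0.08, i.e. about 8%
```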
A (real-valued) random variable is a function from a sample space \(\Omega\) to \(\mathbb{R}\).
Examples:
The first two examples are discrete random variables, the last a continuous random variable.
The probabilities with which a random variable takes on specific values depend on the probabilities of the underlying events.
The probability distribution can be described by:
probability mass function for discrete random variables
probability density function (pdf) or the cumulative distribution function (cdf) for continuous random variables
Example rolling a single die:
Example rolling two dice:
For a continuous random variable, the probability mass (1 in total) must be spread over a continuum. The pdf shows the “density of the probability” at a given location.
Probabilities are obtained by integration: \(P(Y\in[a,b])=\int_a^b f_Y(y)dy\)
For a given value \(y\), the cdf gives the probability that the random variable \(Y\) takes on a value less than or equal to \(y\):
\[F_Y(y)=P(Y \le y)=\int_{-\infty}^y f_Y(t)dt\]
ShristiV via Wikimedia Commons
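A sketch using the standard library's `statistics.NormalDist` (the standard normal is an arbitrary example distribution):

```python
from statistics import NormalDist

Y = NormalDist(mu=0, sigma=1)  # standard normal, chosen as an example

# cdf: F_Y(y) = P(Y <= y)
print(Y.cdf(0))  # 0.5: half the probability mass lies below the mean

# P(Y in [a, b]) = F_Y(b) - F_Y(a)
a, b = -1, 1
print(Y.cdf(b) - Y.cdf(a))  # about 0.683
```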
The expected value \(\operatorname {E}(Y)\) of a random variable \(Y\) is its “average value”
It is calculated as a probability-weighted average of all the possible outcomes. For a discrete random variable that can take on \(k\) values:
\[\operatorname {E}(Y)=\sum_{i=1}^{k}y_ip_i\] where \(p_i=P(Y=y_i)\) (probability mass function)
For the outcome of a roll of a fair die: \(\operatorname {E} [Y]=1\cdot {\frac {1}{6}}+2\cdot {\frac {1}{6}}+3\cdot {\frac {1}{6}}+4\cdot {\frac {1}{6}}+5\cdot {\frac {1}{6}}+6\cdot {\frac {1}{6}}=3.5\)
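The die calculation, written out as a probability-weighted sum:

```python
# Expected value of a fair die roll
outcomes = [1, 2, 3, 4, 5, 6]
p = 1 / 6  # each outcome equally likely

expected = sum(y * p for y in outcomes)
print(expected)  # 3.5
```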
In statistics, the expected value is often referred to as the mean or population mean of a random variable and denoted
\[\mu_Y=\operatorname {E}(Y)\] The variance of a random variable is a measure of how widely it varies around its mean:
\[\sigma_Y^2=\text{Var}(Y)=\operatorname {E}\left[(Y-\mu_Y)^2\right]\]
For two random variables \(X\) and \(Y\) and an arbitrary number \(a\) it can be shown that:
\[\operatorname {E}(aX)=a\operatorname {E}(X)\] \[\operatorname{E}(X+Y)=\operatorname{E}(X)+\operatorname{E}(Y)\]
\[\text{Var}(aX)=a^2\text{Var}(X)\] For independent random variables \(X\) and \(Y\)
\[\text{Var}(X+Y)=\text{Var}(X)+\text{Var}(Y)\] \[\text{Var}(X-Y)=\text{Var}(X)+\text{Var}(Y)\]
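These rules can be checked by exact enumeration (no simulation), e.g. for the fair-die distribution:

```python
from itertools import product

outcomes = [1, 2, 3, 4, 5, 6]  # equally likely die outcomes

def mean(values):
    return sum(values) / len(values)

def var(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)

a = 3
# Var(aX) = a^2 Var(X)
assert abs(var([a * x for x in outcomes]) - a**2 * var(outcomes)) < 1e-9

# For independent X, Y (two dice): Var(X+Y) = Var(X) + Var(Y)
sums = [x + y for x, y in product(outcomes, repeat=2)]
assert abs(var(sums) - 2 * var(outcomes)) < 1e-9

print(var(outcomes))  # about 2.917 (= 35/12)
```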
The Bernoulli distribution is the probability distribution of a random variable, say \(Y\), that takes on the value 1 (success) with a given probability \(\pi\) (success probability) and the value 0 (failure) with probability \(1-\pi\). The probability mass function is:
\[f(y)=\begin{cases} \pi, & \text{if}\ y=1 \\ 1-\pi, & \text{if}\ y=0 \end{cases} \]
Quiz: Can you find a formula for \(f(y)\) that fits on one line?
Solution: \(f(y)=\pi^y(1-\pi)^{(1-y)}\)
We can easily calculate the mean of a Bernoulli distributed value:
\[\mu=1\cdot \pi+0\cdot (1-\pi)=\pi\] … and the variance:
\[\sigma^2=(1-\pi)^2\cdot \pi+(0-\pi)^2\cdot (1-\pi)=\pi-\pi^2=\pi(1-\pi)\]
If \(n\) random variables \(Y_i\) (\(i=1,\ldots,n\)) are independent draws from the Bernoulli distribution with parameter \(\pi\), the random variable \(K=\sum_{i=1}^n Y_i\) follows a binomial distribution with parameters \(n\) and \(\pi\). The probability mass function is given by:
\[f(k)=\text{Pr}(K=k)= {n\choose k} \pi^k(1-\pi)^{(n-k)}\] Here \({n \choose k}\) (read as ‘n choose k’) is the binomial coefficient and represents the number of different subsets of size \(k\) that can be drawn from \(n\) elements.
Let \(k\) be the number of individuals developing the disease in a cohort of \(n\) individuals, assuming that the individual risk is \(\pi\). Below we assume \(\pi=0.2\) and \(n=20\).
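The binomial pmf for this cohort, sketched with the standard library's `math.comb`:

```python
from math import comb

n, pi = 20, 0.2  # values from the text: cohort of 20, individual risk 0.2

def binom_pmf(k, n, pi):
    # P(K = k) = C(n, k) * pi^k * (1 - pi)^(n - k)
    return comb(n, k) * pi**k * (1 - pi) ** (n - k)

# Probability that exactly 4 of the 20 individuals develop the disease
print(round(binom_pmf(4, n, pi), 4))  # about 0.218

# Sanity check: the pmf sums to 1 over k = 0..n
print(sum(binom_pmf(k, n, pi) for k in range(n + 1)))
```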
The normal distribution is a frequently used distribution for continuous random variables, and one that statistics relies on heavily.
The pdf of a normally distributed random variable \(Y\) with mean \(\mu\) and variance \(\sigma^2\) is given by:
\[f(y)=\frac{1}{\sqrt{2\pi \sigma^2 }}e^{-\frac{1}{2}\frac{(y-\mu)^2}{\sigma^2}}\]
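The formula translates directly into code; as a sanity check, it can be compared against the standard library (evaluated at \(y=0\) with \(\mu=0\), \(\sigma^2=1\) as an example):

```python
import math
from statistics import NormalDist

def normal_pdf(y, mu, sigma2):
    # pdf of a normal distribution with mean mu and variance sigma2
    return math.exp(-0.5 * (y - mu) ** 2 / sigma2) / math.sqrt(2 * math.pi * sigma2)

print(normal_pdf(0, 0, 1))      # about 0.3989
print(NormalDist(0, 1).pdf(0))  # same value from the standard library
```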
Means and variances of the distributions introduced so far:
| Distribution | Mean | Variance |
|---|---|---|
| Bernoulli | \(\pi\) | \(\pi(1-\pi)\) |
| Binomial | \(n\pi\) | \(n\pi(1-\pi)\) |
| Normal | \(\mu\) | \(\sigma^2\) |